Gradient Inversion Attack


Gradient Inversion in Federated Reinforcement Learning

He, Shenghong

arXiv.org Artificial Intelligence

Federated reinforcement learning (FRL) enables distributed learning of optimal policies while preserving local data privacy through gradient sharing. However, FRL faces the risk of data privacy leaks, where attackers exploit shared gradients to reconstruct local training data. Compared to traditional supervised federated learning, successful reconstruction in FRL requires the generated data not only to match the shared gradients but also to align with the real transition dynamics of the environment (i.e., with the real data transition distribution). To address this issue, we propose a novel attack method called Regularization Gradient Inversion Attack (RGIA), which enforces prior-knowledge-based regularization on states, rewards, and transition dynamics during the optimization process to ensure that the reconstructed data remain close to the true transition distribution. Theoretically, we prove that the prior-knowledge-based regularization term narrows the solution space from a broad set containing spurious solutions to a constrained subset that satisfies both gradient matching and the true transition dynamics. Extensive experiments on control tasks and autonomous driving tasks demonstrate that RGIA can effectively constrain the reconstructed data's transition distribution and thus successfully reconstruct local private data.
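For intuition, the following is a minimal sketch of the kind of gradient-matching optimization with prior-based regularization that the abstract describes, written in PyTorch. The toy policy network, the REINFORCE-style surrogate loss, the specific regularizers (a state-range prior and a bounded-reward prior), and the assumption that the attacker knows the actions are all illustrative choices, not the authors' RGIA implementation.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
policy = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))  # toy policy network

def policy_loss(states, actions, returns):
    # Simple REINFORCE-style surrogate: -E[log pi(a|s) * return]
    logp = torch.log_softmax(policy(states), dim=-1)
    chosen = logp.gather(1, actions.unsqueeze(1)).squeeze(1)
    return -(chosen * returns).mean()

# Gradients the victim would share, simulated here from a private batch of transitions.
true_states, true_actions = torch.randn(8, 4), torch.randint(0, 2, (8,))
true_returns = torch.randn(8)
shared_grads = torch.autograd.grad(
    policy_loss(true_states, true_actions, true_returns), policy.parameters())

# Attacker's dummy data, optimized so its gradients reproduce the shared ones.
dummy_states = torch.randn(8, 4, requires_grad=True)
dummy_returns = torch.randn(8, requires_grad=True)
opt = torch.optim.Adam([dummy_states, dummy_returns], lr=0.05)

for step in range(300):
    opt.zero_grad()
    dummy_grads = torch.autograd.grad(
        policy_loss(dummy_states, true_actions, dummy_returns),
        policy.parameters(), create_graph=True)
    grad_match = sum(((dg - sg) ** 2).sum() for dg, sg in zip(dummy_grads, shared_grads))
    # Hypothetical prior-knowledge regularizers standing in for the paper's state,
    # reward, and transition-dynamics terms: keep states in a plausible range
    # and rewards bounded.
    prior_reg = 0.1 * torch.relu(dummy_states.abs() - 3.0).sum() \
              + 0.1 * (dummy_returns ** 2).mean()
    (grad_match + prior_reg).backward()
    opt.step()
```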


Privacy in Federated Learning with Spiking Neural Networks

Aksu, Dogukan, del Rincon, Jesus Martinez, Alouani, Ihsen

arXiv.org Artificial Intelligence

Spiking neural networks (SNNs) have emerged as prominent candidates for embedded and edge AI. Their inherent low power consumption makes them far more efficient than conventional ANNs in scenarios where energy budgets are tightly constrained. In parallel, federated learning (FL) has become the prevailing training paradigm in such settings, enabling on-device learning while limiting the exposure of raw data. However, gradient inversion attacks represent a critical privacy threat in FL, where sensitive training data can be reconstructed directly from shared gradients. While this vulnerability has been widely investigated in conventional ANNs, its implications for SNNs remain largely unexplored. In this work, we present the first comprehensive empirical study of gradient leakage in SNNs across diverse data domains. SNNs are inherently non-differentiable and are typically trained using surrogate gradients, which we hypothesized would be less correlated with the original input and thus less informative from a privacy perspective. To investigate this, we adapt different gradient leakage attacks to the spike domain. Our experiments reveal a striking contrast with conventional ANNs: whereas ANN gradients reliably expose salient input content, SNN gradients yield noisy, temporally inconsistent reconstructions that fail to recover meaningful spatial or temporal structure. These results indicate that the combination of event-driven dynamics and surrogate-gradient training substantially reduces gradient informativeness. To the best of our knowledge, this work provides the first systematic benchmark of gradient inversion attacks for spiking architectures, highlighting the inherent privacy-preserving potential of neuromorphic computation.
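As a point of reference for readers unfamiliar with surrogate-gradient training, the sketch below shows the standard trick the abstract alludes to: the spike nonlinearity is a hard threshold in the forward pass, while the backward pass substitutes a smooth surrogate derivative. The fast-sigmoid surrogate and the threshold value are common choices assumed here for illustration; this is not the authors' code.

```python
import torch

class SurrogateSpike(torch.autograd.Function):
    @staticmethod
    def forward(ctx, membrane_potential, threshold=1.0):
        ctx.save_for_backward(membrane_potential)
        ctx.threshold = threshold
        return (membrane_potential >= threshold).float()  # binary spike output

    @staticmethod
    def backward(ctx, grad_output):
        (v,) = ctx.saved_tensors
        # Fast-sigmoid surrogate derivative: smooth and bounded, and only loosely
        # tied to the exact membrane values -- the informativeness gap the abstract
        # hypothesizes about.
        surrogate = 1.0 / (1.0 + 10.0 * (v - ctx.threshold).abs()) ** 2
        return grad_output * surrogate, None

spikes = SurrogateSpike.apply(torch.randn(16, 128, requires_grad=True))
```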


On the Detectability of Active Gradient Inversion Attacks in Federated Learning

Carletti, Vincenzo, Foggia, Pasquale, Mazzocca, Carlo, Parrella, Giuseppe, Vento, Mario

arXiv.org Artificial Intelligence

One of the key advantages of Federated Learning (FL) is its ability to collaboratively train a Machine Learning (ML) model while keeping clients' data on-site. However, this can create a false sense of security. Although not sharing private data increases overall privacy, prior studies have shown that gradients exchanged during FL training remain vulnerable to Gradient Inversion Attacks (GIAs). These attacks allow reconstructing the clients' local data, breaking the privacy promise of FL. GIAs can be launched by either a passive or an active server. In the latter case, a malicious server manipulates the global model to facilitate data reconstruction. While effective, earlier attacks in this category have been shown to be detectable by clients, limiting their real-world applicability. Recently, novel active GIAs have emerged, claiming to be far stealthier than previous approaches. This work provides the first comprehensive analysis of these claims, investigating four state-of-the-art GIAs. We propose novel lightweight client-side detection techniques based on statistically improbable weight structures and anomalous loss and gradient dynamics. Extensive evaluation across several configurations demonstrates that our methods enable clients to effectively detect active GIAs without any modifications to the FL training protocol.
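As a rough illustration of what a lightweight client-side check could look like, the hypothetical snippet below flags a received global model whose linear layers contain statistically improbable weight structure (many near-duplicate rows), a pattern some active gradient inversion attacks rely on. The duplicate-row heuristic and both thresholds are assumptions for illustration, not the detection techniques proposed in the paper.

```python
import torch
import torch.nn as nn

def suspicious_duplicate_rows(model: nn.Module, cos_thresh: float = 0.999,
                              frac_thresh: float = 0.25) -> bool:
    """Return True if any linear layer has an implausibly large fraction of
    near-duplicate weight rows (cosine similarity above cos_thresh)."""
    for module in model.modules():
        if isinstance(module, nn.Linear) and module.out_features > 1:
            w = nn.functional.normalize(module.weight.detach(), dim=1)
            sims = w @ w.t()
            sims.fill_diagonal_(0)
            dup_frac = (sims > cos_thresh).any(dim=1).float().mean().item()
            if dup_frac > frac_thresh:
                return True  # an honest random initialization rarely looks like this
    return False

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
print(suspicious_duplicate_rows(model))  # expected: False for a random init
```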



From Research to Reality: Feasibility of Gradient Inversion Attacks in Federated Learning

Valadi, Viktor, Åkesson, Mattias, Östman, Johan, Toor, Salman, Hellander, Andreas

arXiv.org Artificial Intelligence

Gradient inversion attacks have garnered attention for their ability to compromise privacy in federated learning. However, many studies consider attacks with the model in inference mode, where training-time behaviors like dropout are disabled and batch normalization relies on fixed statistics. In this work, we systematically analyze how architecture and training behavior affect vulnerability, including the first in-depth study of inference-mode clients, which we show dramatically simplifies inversion. To assess attack feasibility under more realistic conditions, we turn to clients operating in standard training mode. In this setting, we find that successful attacks are only possible when several architectural conditions are met simultaneously: models must be shallow and wide, use skip connections, and, critically, employ pre-activation normalization. We introduce two novel attacks against models in training mode with varying attacker knowledge, achieving state-of-the-art performance under realistic training conditions. We extend these efforts by presenting the first attack on a production-grade object-detection model. Here, to enable any visibly identifiable leakage, we revert to the lenient inference-mode setting and make multiple architectural modifications to increase model vulnerability, with the extent of required changes highlighting the strong inherent robustness of such architectures. We conclude this work by offering the first comprehensive mapping of settings, clarifying which combinations of architectural choices and operational modes meaningfully impact privacy. Our analysis provides actionable insight into when models are likely vulnerable, when they appear robust, and where subtle leakage may persist. Together, these findings reframe how gradient inversion risk should be assessed in future research and deployment scenarios.
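The distinction between inference-mode and training-mode clients that the abstract emphasizes can be made concrete with a small PyTorch sketch: in eval mode, BatchNorm uses fixed running statistics that are independent of the private batch, removing one source of entanglement that complicates inversion in train mode. The toy model and data below are assumptions for illustration only.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8),
                      nn.ReLU(), nn.Flatten(), nn.Linear(8 * 8 * 8, 10))
batch = torch.randn(4, 3, 8, 8)
labels = torch.randint(0, 10, (4,))
loss_fn = nn.CrossEntropyLoss()

def client_gradients(training_mode: bool):
    # train(): BatchNorm normalizes with the private batch's own statistics;
    # eval(): BatchNorm uses fixed running statistics instead.
    model.train(training_mode)
    return torch.autograd.grad(loss_fn(model(batch), labels), model.parameters())

train_grads = client_gradients(True)   # gradients entangled with batch statistics
eval_grads = client_gradients(False)   # the "inference-mode client" setting
```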



Evaluating Selective Encryption Against Gradient Inversion Attacks

Gu, Jiajun, Yao, Yuhang, Wang, Shuaiqi, Joe-Wong, Carlee

arXiv.org Artificial Intelligence

Gradient inversion attacks pose significant privacy threats to distributed training frameworks such as federated learning, enabling malicious parties to reconstruct sensitive local training data from gradient communications between clients and an aggregation server during the aggregation process. While traditional encryption-based defenses, such as homomorphic encryption, offer strong privacy guarantees without compromising model utility, they often incur prohibitive computational overheads. To mitigate this, selective encryption has emerged as a promising approach, encrypting only a subset of gradient data based on the data's significance under a certain metric. However, there have been few systematic studies on how to specify this metric in practice. This paper systematically evaluates selective encryption methods with different significance metrics against state-of-the-art attacks. Our findings demonstrate the feasibility of selective encryption in reducing computational overhead while maintaining resilience against attacks. We propose a distance-based significance analysis framework that provides theoretical foundations for selecting critical gradient elements for encryption. Through extensive experiments on different model architectures (LeNet, CNN, BERT, GPT-2) and attack types, we identify gradient magnitude as a generally effective metric for protection against optimization-based gradient inversions. However, we also observe that no single selective encryption strategy is universally optimal across all attack scenarios, and we provide guidelines for choosing appropriate strategies for different model architectures and privacy requirements.
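To make the magnitude-based metric concrete, here is a hypothetical sketch of selective protection in which only the top fraction of gradient entries by absolute value would be encrypted, while the remainder is shared in the clear. The protected fraction and the simple split (standing in for an actual homomorphic-encryption step) are assumptions, not the paper's implementation.

```python
import torch

def split_by_magnitude(grad: torch.Tensor, protect_frac: float = 0.1):
    """Separate the highest-magnitude gradient entries (to be encrypted) from the
    rest (shared in plaintext with the protected positions zeroed out)."""
    flat = grad.flatten()
    k = max(1, int(protect_frac * flat.numel()))
    topk = torch.topk(flat.abs(), k).indices           # most informative entries
    protected_mask = torch.zeros_like(flat, dtype=torch.bool)
    protected_mask[topk] = True
    to_encrypt = flat[protected_mask]                   # would go through HE in practice
    plaintext = flat.masked_fill(protected_mask, 0.0)   # shared without protection
    return to_encrypt, plaintext.view_as(grad), protected_mask.view_as(grad)

grad = torch.randn(64, 128)
enc_part, clear_part, mask = split_by_magnitude(grad, protect_frac=0.05)
```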


Defending Against Gradient Inversion Attacks for Biomedical Images via Learnable Data Perturbation

Jiang, Shiyi, Firouzi, Farshad, Chakrabarty, Krishnendu

arXiv.org Artificial Intelligence

The increasing need for sharing healthcare data and collaborating on clinical research has raised privacy concerns. Health information leakage due to malicious attacks can lead to serious problems such as misdiagnoses and patient identification issues. Privacy-preserving machine learning (PPML) and privacy-enhancing technologies, particularly federated learning (FL), have emerged in recent years as innovative solutions to balance privacy protection with data utility; however, they also suffer from inherent privacy vulnerabilities. Gradient inversion attacks constitute major threats to data sharing in federated learning. Researchers have proposed many defenses against gradient inversion attacks. However, current defense methods for healthcare data lack generalizability, i.e., existing solutions may not be applicable to data from a broader range of populations. In addition, most existing defense methods are tested using non-healthcare data, which raises concerns about their applicability to real-world healthcare systems. In this study, we present a defense against gradient inversion attacks in federated learning. We achieve this using latent data perturbation and minimax optimization, utilizing both general and medical image datasets. We compare our method to two baselines; the results show that our approach outperforms them, reducing the attacker's accuracy in classifying reconstructed images by 12.5%. The proposed method also yields an increase of over 12.4% in Mean Squared Error (MSE) between the original and reconstructed images while maintaining model utility at around 90% client classification accuracy. The results suggest the potential of a generalizable defense for healthcare data.
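A loose sketch of the learnable-perturbation idea the abstract describes is given below: a perturbation added to intermediate (latent) features is trained in a minimax fashion, keeping the task loss low while degrading a simulated inverter that tries to reconstruct the input from the perturbed features. All modules, sizes, loss weights, and the alternating schedule are illustrative assumptions rather than the authors' method.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU())
classifier = nn.Linear(128, 10)
inverter = nn.Sequential(nn.Linear(128, 784))        # simulated reconstruction attacker
perturb = nn.Parameter(torch.zeros(128))             # learnable latent perturbation

defense_opt = torch.optim.Adam([perturb], lr=1e-3)
attacker_opt = torch.optim.Adam(inverter.parameters(), lr=1e-3)
ce, mse = nn.CrossEntropyLoss(), nn.MSELoss()

x = torch.randn(32, 1, 28, 28)                       # toy image batch
y = torch.randint(0, 10, (32,))

for step in range(100):
    z = encoder(x) + perturb
    # Attacker step: improve reconstruction from the perturbed latents.
    attacker_opt.zero_grad()
    mse(inverter(z.detach()), x.flatten(1)).backward()
    attacker_opt.step()
    # Defender step: keep the task loss low while pushing reconstruction error up
    # (the fixed encoder/classifier are never updated in this sketch).
    defense_opt.zero_grad()
    loss = ce(classifier(z), y) - 0.5 * mse(inverter(z), x.flatten(1))
    loss.backward()
    defense_opt.step()
```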


GRAIN: Exact Graph Reconstruction from Gradients

Drencheva, Maria, Petrov, Ivo, Baader, Maximilian, Dimitrov, Dimitar I., Vechev, Martin

arXiv.org Artificial Intelligence

Federated learning claims to enable collaborative model training among multiple clients with data privacy by transmitting gradient updates instead of the actual client data. However, recent studies have shown that client privacy is still at risk due to so-called gradient inversion attacks, which can precisely reconstruct clients' text and image data from the shared gradient updates. While these attacks demonstrate severe privacy risks for certain domains and architectures, the vulnerability of other commonly used data types, such as graph-structured data, remains under-explored. To bridge this gap, we present GRAIN, the first exact gradient inversion attack on graph data in the honest-but-curious setting that recovers both the structure of the graph and the associated node features. Concretely, we focus on Graph Convolutional Networks (GCN) and Graph Attention Networks (GAT) -- two of the most widely used frameworks for learning on graphs. Our method first utilizes the low-rank structure of GNN gradients to efficiently reconstruct and filter the client subgraphs, which are then joined to complete the input graph. We evaluate our approach on molecular, citation, and social network datasets using our novel metric. We show that GRAIN reconstructs up to 80% of all graphs exactly, significantly outperforming the baseline, which achieves up to 20% correctly positioned nodes.
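The low-rank property that GRAIN exploits can be illustrated with a one-layer toy example: for a single GCN layer, the weight gradient factors into the propagated node features and the backpropagated error, so its rank is bounded by the number of nodes in the client graph. The snippet below (with an identity adjacency and a placeholder loss, both assumptions) only demonstrates this property; it is not the reconstruction algorithm itself.

```python
import torch

num_nodes, in_dim, out_dim = 5, 16, 32
adj = torch.eye(num_nodes)                      # toy normalized adjacency (self-loops only)
features = torch.randn(num_nodes, in_dim)
weight = torch.randn(in_dim, out_dim, requires_grad=True)

out = adj @ features @ weight                   # one GCN propagation step
loss = out.pow(2).mean()                        # placeholder loss
(grad_w,) = torch.autograd.grad(loss, weight)

# grad_w = (adj @ features)^T @ dLoss/dOut, so its rank is at most num_nodes.
print(torch.linalg.matrix_rank(grad_w).item())  # <= 5 here
```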


CENSOR: Defense Against Gradient Inversion via Orthogonal Subspace Bayesian Sampling

Zhang, Kaiyuan, Cheng, Siyuan, Shen, Guangyu, Ribeiro, Bruno, An, Shengwei, Chen, Pin-Yu, Zhang, Xiangyu, Li, Ninghui

arXiv.org Artificial Intelligence

Federated learning collaboratively trains a neural network on a global server, where each local client receives the current global model weights and sends back parameter updates (gradients) based on its local private data. The process of sending these model updates may leak the client's private data. Existing gradient inversion attacks can exploit this vulnerability to recover private training instances from a client's gradient vectors. Recently, researchers have proposed advanced gradient inversion techniques that existing defenses struggle to handle effectively. In this work, we present a novel defense tailored for large neural network models. Our defense capitalizes on the high dimensionality of the model parameters to perturb gradients within a subspace orthogonal to the original gradient. By leveraging cold posteriors over orthogonal subspaces, our defense implements a refined gradient update mechanism. This enables the selection of an optimal gradient that not only safeguards against gradient inversion attacks but also maintains model utility. We conduct comprehensive experiments across three different datasets and evaluate our defense against various state-of-the-art attacks and defenses. Code is available at https://censor-gradient.github.io.
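A much-simplified sketch of the geometric idea in the abstract is shown below: the gradient is perturbed only within the subspace orthogonal to itself, so the component that drives learning is preserved while the shared update is obfuscated. The noise scale is an assumption, and the paper's Bayesian sampling over cold posteriors is not reproduced here.

```python
import torch

def orthogonal_perturb(grad: torch.Tensor, noise_scale: float = 0.1) -> torch.Tensor:
    """Add noise to a gradient only within the subspace orthogonal to it."""
    g = grad.flatten()
    noise = noise_scale * torch.randn_like(g)
    # Remove the component of the noise that lies along the gradient direction.
    noise = noise - (noise @ g) / (g @ g).clamp_min(1e-12) * g
    return (g + noise).view_as(grad)

grad = torch.randn(256, 128)
protected = orthogonal_perturb(grad)
# Sanity check: the perturbation is (numerically) orthogonal to the original gradient.
print(torch.dot((protected - grad).flatten(), grad.flatten()).item())
```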